| |
| | ```sh |
| | whisper --model medium.en --language English --output_format vtt --fp16 False --word_timestamps True --max_words_per_line 30 --initial_prompt "<prompt>" <input-file> |
| | ``` |

With `whisper`, offline transcription runs at roughly real-time speed (i.e., 1 hour of audio takes about 1 hour to transcribe).

Models are downloaded to `~/.cache/whisper`.

`whisper-cpp` is actually the one we want, as it’s written in C++ and supports Core ML. Annoyingly, its CLI options differ from `whisper`’s (e.g. `--max-len` counts characters, whereas `--max_words_per_line` counts words), but it seems to have more of them. It uses the same models as Vibe (below). Input audio must be resampled to 16 kHz first 🙁 (`ffmpeg -i <input> -vn -ar 16000 <output>`, where `-vn` drops any video stream and `-ar 16000` sets the sample rate).

```sh
whisper-cpp --model ~/Library/Application\ Support/github.com.thewh1teagle.vibe/ggml-medium.en.bin --language en --output-vtt --max-len 150 --prompt "<prompt>" <input file>
```
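Since every whisper-cpp run needs the resampling step first, the two commands can be chained. This is a minimal sketch, assuming `ffmpeg` and `whisper-cpp` are on `PATH` and the model lives at the Vibe path used above. The `transcribe_cpp` function, the `MODEL` override variable, and the `<input>.16k.wav` intermediate name are made up for illustration; `-y` (ffmpeg’s overwrite-without-asking flag) is added so reruns don’t prompt.

```sh
#!/bin/sh
# Hypothetical helper (not part of whisper-cpp): resample to 16 kHz, then transcribe.
# Assumes ffmpeg and whisper-cpp are on PATH; set MODEL to use another model file.
MODEL="${MODEL:-$HOME/Library/Application Support/github.com.thewh1teagle.vibe/ggml-medium.en.bin}"

transcribe_cpp() {
  input=$1               # any audio/video file ffmpeg can read
  prompt=${2:-}          # optional initial prompt (names, jargon, etc.)
  wav="$input.16k.wav"   # hypothetical name for the intermediate 16 kHz file

  # -y overwrites a previous run, -vn drops video, -ar 16000 resamples to 16 kHz.
  # whisper-cpp only runs if the conversion succeeded.
  ffmpeg -y -i "$input" -vn -ar 16000 "$wav" &&
    whisper-cpp --model "$MODEL" --language en --output-vtt --max-len 150 \
      --prompt "$prompt" "$wav"
}
```

Usage: `transcribe_cpp talk.mp4 "<prompt>"`. By default whisper-cpp names its output after its input file, so the captions should land next to the intermediate WAV (e.g. `talk.mp4.16k.wav.vtt`).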

`whisper-cpp` is **much** faster than `whisper`.

**Vibe** seems to be a useful cross-platform GUI implementation. Internally it runs whisper-cpp through Rust bindings rather than a Rust port. It seems to produce very short lines. It claims to have a CLI, but I can’t figure out how to make it work, and the “max sentence length” setting doesn’t seem to help (ohhhh, it’s measured in **characters**, not words, duh 😖). It also produces not-quite-correct VTT: no `WEBVTT` header, and no blank line between entries.
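Until Vibe’s output is fixed, a small post-processing step can patch both VTT problems. This is a minimal sketch using `awk`, assuming each Vibe entry is a timing line (containing `-->`) followed by its text, with no numeric cue identifiers; `fix_vibe_vtt` is a hypothetical name. It prepends `WEBVTT` plus a blank line if the header is missing, and inserts a blank line before any timing line that directly follows text, so running it twice is harmless.

```sh
#!/bin/sh
# Hypothetical fix-up for Vibe's VTT output (reads a file, or stdin if no argument).
# Assumes cues are "timing line + text" with no numeric cue identifiers.
fix_vibe_vtt() {
  awk '
    # Missing header: emit "WEBVTT" and the blank line that must follow it.
    NR == 1 && $0 !~ /^WEBVTT/ { print "WEBVTT"; print "" }
    # A timing line that directly follows non-blank text starts a new cue.
    NR > 1 && /-->/ && prev != "" { print "" }
    { print; prev = $0 }
  ' "${1:--}"
}
```

Usage: `fix_vibe_vtt talk.vtt > talk.fixed.vtt`.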

VS Code extension issues:
